Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 7.00 12.00 16.12 20.50 104.30
summary(prt.22$`Daily Mean PM2.5 Concentration`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.200 4.200 7.000 8.573 10.900 302.500
A minimum PM2.5 concentration of -2.2 is not physically meaningful, so I will subset prt.22 to keep only observations with PM2.5 >= 0.
prt.22 <- prt.22[prt.22$`Daily Mean PM2.5 Concentration` >= 0]
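As a quick sanity check on this kind of filter, the sketch below counts how many rows a `>= 0` subset drops. The `toy` table and its values are made up for illustration and are not taken from the EPA data; in the real analysis the same two expressions would be run on prt.22 before and after subsetting.

```r
library(data.table)

# Hypothetical toy data; the values are illustrative only
toy <- data.table(`Daily Mean PM2.5 Concentration` = c(-2.2, 0, 4.2, 7.0, -0.5))

# Count negative readings before filtering
n_negative <- toy[`Daily Mean PM2.5 Concentration` < 0, .N]

# Keep zero and positive readings, mirroring the filter above
toy_clean <- toy[`Daily Mean PM2.5 Concentration` >= 0]

n_negative       # number of rows dropped
nrow(toy_clean)  # number of rows kept
```

Reporting the number of dropped rows alongside the filter makes it easier to judge how much the subset changes the 2022 sample.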
Step 2: Combine Data
data_combined <- rbindlist(list(
  prt.02[, year := 2002],
  prt.22[, year := 2022]
))
setnames(
  data_combined,
  c("Daily Mean PM2.5 Concentration", "SITE_LATITUDE", "SITE_LONGITUDE", "Site Name"),
  c("PM2.5", "Lat", "Lon", "SiteName")
)
head(data_combined)
Date Source Site ID POC PM2.5 UNITS DAILY_AQI_VALUE SiteName
1: 01/05/2002 AQS 60010007 1 25.1 ug/m3 LC 78 Livermore
2: 01/06/2002 AQS 60010007 1 31.6 ug/m3 LC 92 Livermore
3: 01/08/2002 AQS 60010007 1 21.4 ug/m3 LC 71 Livermore
4: 01/11/2002 AQS 60010007 1 25.9 ug/m3 LC 80 Livermore
5: 01/14/2002 AQS 60010007 1 34.5 ug/m3 LC 98 Livermore
6: 01/17/2002 AQS 60010007 1 41.0 ug/m3 LC 115 Livermore
DAILY_OBS_COUNT PERCENT_COMPLETE AQS_PARAMETER_CODE AQS_PARAMETER_DESC
1: 1 100 88101 PM2.5 - Local Conditions
2: 1 100 88101 PM2.5 - Local Conditions
3: 1 100 88101 PM2.5 - Local Conditions
4: 1 100 88101 PM2.5 - Local Conditions
5: 1 100 88101 PM2.5 - Local Conditions
6: 1 100 88101 PM2.5 - Local Conditions
CBSA_CODE CBSA_NAME STATE_CODE STATE
1: 41860 San Francisco-Oakland-Hayward, CA 6 California
2: 41860 San Francisco-Oakland-Hayward, CA 6 California
3: 41860 San Francisco-Oakland-Hayward, CA 6 California
4: 41860 San Francisco-Oakland-Hayward, CA 6 California
5: 41860 San Francisco-Oakland-Hayward, CA 6 California
6: 41860 San Francisco-Oakland-Hayward, CA 6 California
COUNTY_CODE COUNTY Lat Lon year
1: 1 Alameda 37.68753 -121.7842 2002
2: 1 Alameda 37.68753 -121.7842 2002
3: 1 Alameda 37.68753 -121.7842 2002
4: 1 Alameda 37.68753 -121.7842 2002
5: 1 Alameda 37.68753 -121.7842 2002
6: 1 Alameda 37.68753 -121.7842 2002
Step 3: Basic Map
library(leaflet)
leaflet(data_combined) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~Lon, lat = ~Lat,
    radius = 1,
    color = ~ifelse(year == 2002, "red", "yellow"),
    weight = 2, opacity = 0.1,
    popup = ~SiteName,
    label = "Map of Sites Measured in 2002 (red) and 2022 (yellow)"
  )
Markers are highly concentrated in California's major population centers: Sacramento, the Bay Area, and Los Angeles/San Diego. Sites are also distributed across the rest of the state. There appear to be more sites in 2022 than in 2002, judging by the larger number of yellow markers.
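The visual impression that one year has more sites can be confirmed by counting distinct site names per year. This is a minimal sketch on a hypothetical `toy` table that stands in for data_combined; the site labels are invented, and only the column names `SiteName` and `year` follow the renamed columns above.

```r
library(data.table)

# Hypothetical toy data standing in for data_combined
toy <- data.table(
  SiteName = c("A", "A", "B", "A", "B", "C", "C"),
  year     = c(2002, 2002, 2002, 2022, 2022, 2022, 2022)
)

# Number of distinct monitoring sites in each year
sites_per_year <- toy[, .(n_sites = uniqueN(SiteName)), by = year]
sites_per_year
```

Counting unique names rather than rows matters here because each site contributes many daily observations, so row counts alone would overstate the difference in coverage.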
Step 4: Checking for missing/implausible values of PM2.5 in combined dataset.
sum(is.na(data_combined$PM2.5))
[1] 0
head(data_combined$PM2.5)
[1] 25.1 31.6 21.4 25.9 34.5 41.0
tail(data_combined$PM2.5)
[1] 3.4 3.8 6.0 34.8 23.2 1.0
summary(data_combined$PM2.5)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 4.60 7.70 10.24 12.50 302.50
The combined dataset contains no missing PM2.5 values (the NA count is 0), and the minimum is 0.00, so the negative readings found in the 2022 data have been removed.
Step 5: 3 different spatial levels for comparing daily concentrations of PM2.5 in CA from 2002 to 2022.
State-wide Data:
library(ggplot2)
library(dplyr)  # provides group_by() and summarize()
average_pm_by_year <- data_combined %>%
  group_by(year) %>%
  summarize(
    Average_PM = mean(PM2.5, na.rm = TRUE),
    SD_PM = sd(PM2.5, na.rm = TRUE)
  )
ggplot(average_pm_by_year, aes(x = as.factor(year), y = Average_PM)) +
  geom_bar(stat = "identity", fill = "blue") +
  geom_errorbar(
    aes(ymin = Average_PM - SD_PM, ymax = Average_PM + SD_PM),
    width = 0.2, position = position_dodge(width = 0.9)
  ) +
  labs(
    title = "Average PM2.5 Level in California by Year (2002 vs. 2022)",
    x = "Year", y = "Average PM2.5 Level"
  )
t_test_state <- t.test(
  prt.02$`Daily Mean PM2.5 Concentration`,
  prt.22$`Daily Mean PM2.5 Concentration`,
  paired = FALSE
)
t_test_state
Welch Two Sample t-test
data: prt.02$`Daily Mean PM2.5 Concentration` and prt.22$`Daily Mean PM2.5 Concentration`
t = 65.583, df = 18898, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7.283791 7.732587
sample estimates:
mean of x mean of y
16.115943 8.607754
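At the state level, the average PM2.5 concentration decreased from 16.12 ug/m3 in 2002 to 8.61 ug/m3 in 2022. This decrease is statistically significant (Welch two-sample t-test, t = 65.583, p < 2.2e-16; 95% confidence interval for the difference in means: 7.28 to 7.73).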
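The t-test output can be cross-checked with simple arithmetic: the difference in sample means should fall inside the reported confidence interval, and an interval that excludes 0 corresponds to a significant difference at the 5% level. The sketch below only reuses the numbers printed in the output; the variable names are my own.

```r
# Values copied from the t-test output above
mean_2002 <- 16.115943
mean_2022 <- 8.607754
ci_lower  <- 7.283791
ci_upper  <- 7.732587

# Point estimate of the difference in means (2002 minus 2022)
diff_means <- mean_2002 - mean_2022

# The estimate should lie inside the interval, and the interval should exclude 0
inside_ci     <- diff_means > ci_lower && diff_means < ci_upper
excludes_zero <- ci_lower > 0 || ci_upper < 0

diff_means
inside_ci
excludes_zero
```

The difference is about 7.51 ug/m3, which sits at the center of the interval, and the whole interval lies well above 0.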
Column 2 ['Average_PM_2022'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names. use.names='check' (default from v1.12.2) emits this message and proceeds as if use.names=FALSE for backwards compatibility. See news item 5 in v1.12.2 for options to control this message.
Column 2 ['Average_PM_2022_site'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names. use.names='check' (default from v1.12.2) emits this message and proceeds as if use.names=FALSE for backwards compatibility. See news item 5 in v1.12.2 for options to control this message.
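The two messages above are emitted by data.table's `rbindlist()` when the tables being stacked do not share the same column names, in which case columns are matched by position. The code that triggered them is not shown here, so the following is only a sketch of two common remedies, using hypothetical tables `avg_02` and `avg_22` with invented values.

```r
library(data.table)

# Hypothetical per-year summaries with mismatched column names (values invented)
avg_02 <- data.table(COUNTY = c("Alameda", "Butte"), Average_PM_2002 = c(16.1, 14.3))
avg_22 <- data.table(COUNTY = c("Alameda", "Butte"), Average_PM_2022 = c(8.6, 7.9))

# Option 1: bind by name and fill columns missing from either table with NA
stacked_fill <- rbindlist(list(avg_02, avg_22), use.names = TRUE, fill = TRUE)

# Option 2: give both tables the same value column, plus a year column
setnames(avg_02, "Average_PM_2002", "Average_PM")
setnames(avg_22, "Average_PM_2022", "Average_PM")
stacked_long <- rbindlist(
  list(avg_02[, year := 2002], avg_22[, year := 2022]),
  use.names = TRUE
)

stacked_fill
stacked_long
```

Option 2 produces a long table with one value column and a `year` column, which is usually the more convenient shape for grouped summaries and plotting. Option 1 keeps the year-specific columns separate and leaves NA where a table lacks one.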
color_palette <- colorNumeric(palette = "viridis", domain = Site_mean$Average_PM_2002_site)
temp.pal02.s <- colorNumeric(
  c('darkgreen', 'goldenrod', 'brown'),
  domain = average_pm_by_site_02$Average_PM_2002_site
)
PMmap02.s <- leaflet(average_pm_by_site_02) %>%
  addProviderTiles('CartoDB.Positron') %>%
  addCircles(
    lat = ~Lat, lng = ~Lon,
    label = ~paste0(round(average_pm_by_site_02$Average_PM_2002_site, 2), ' PM2.5'),
    color = ~temp.pal02.s(average_pm_by_site_02$Average_PM_2002_site),
    opacity = 1, fillOpacity = 1, radius = 500
  ) %>%
  addLegend(
    'bottomleft', pal = temp.pal02.s,
    values = average_pm_by_site_02$Average_PM_2002_site,
    title = 'Mean Concentrations PM2.5 by site in 2002', opacity = 1
  )
PMmap02.s
temp.pal22.s <- colorNumeric(
  c('darkgreen', 'goldenrod', 'brown'),
  domain = average_pm_by_site_22$Average_PM_2022_site
)
PMmap22.s <- leaflet(average_pm_by_site_22) %>%
  addProviderTiles('CartoDB.Positron') %>%
  addCircles(
    lat = ~Lat, lng = ~Lon,
    label = ~paste0(round(average_pm_by_site_22$Average_PM_2022_site, 2), ' PM2.5'),
    color = ~temp.pal22.s(average_pm_by_site_22$Average_PM_2022_site),
    opacity = 1, fillOpacity = 1, radius = 500
  ) %>%
  addLegend(
    'bottomleft', pal = temp.pal22.s,
    values = average_pm_by_site_22$Average_PM_2022_site,
    title = 'Mean Concentrations PM2.5 by site in 2022', opacity = 1
  )
PMmap22.s
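Overall, average PM2.5 concentrations by site decreased from 2002 to 2022. Each map's color scale is fit to its own year's range of site means, so the comparison should be read from the legends and marker labels, not from the colors alone.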